Name | Version | Summary | Date |
kreuzberg | 3.15.0 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-09-14 18:14:57 |
pdf2markdown | 0.3.0 | Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-structured Markdown | 2025-09-14 02:02:58 |
qdrant-loader | 0.7.3 | A tool for collecting and vectorizing technical content from multiple sources and storing it in a QDrant vector database. | 2025-09-11 07:33:39 |
bank-statement-separator | 0.3.0 | AI-powered tool for separating multi-statement PDF files using LangChain and LangGraph | 2025-09-10 14:48:39 |
docstrange | 1.1.6 | Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. | 2025-09-10 09:27:30 |
docling-onnx-models | 0.1.3 | ONNX Runtime implementations for Docling AI models | 2025-09-09 08:45:47 |
mseep-kreuzberg | 3.13.4 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-09-09 03:44:56 |
pydatamax | 0.2.0 | Advanced Data Crawling and Processing Framework | 2025-09-03 17:39:42 |
docuglean-ocr | 1.0.0 | An SDK for intelligent document processing using SOTA VLLM models | 2025-09-02 13:19:12 |
contextgem | 0.18.0 | Effortless LLM extraction from documents | 2025-09-01 21:07:54 |
docx-mcp | 0.1.4 | DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations | 2025-08-31 18:11:33 |
dddocr-py | 0.1.0 | Python client for the 3DOCR.com OCR API | 2025-08-30 19:16:43 |
wizarddocx | 1.0.0 | Text extraction from Microsoft Word files. Parses Word documents natively and can optionally run local OCR with Tesseract for embedded images or scanned pages. Supports page selection and bytes input. Legacy .doc is read-only and OCR is not available. | 2025-08-28 09:27:49 |
mcp-gosling | 0.1.0 | MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library | 2025-08-25 02:12:32 |
smartloop | 1.3.2 | Smartloop Command Line interface to process documents using LLM | 2025-08-24 17:55:11 |
ocr-detection | 0.4.1 | A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR | 2025-08-22 07:27:10 |
qagen | 0.1.1 | A powerful Chinese document QA pairs generation and validation tool with multiple LLM support | 2025-08-21 10:17:34 |
inkognito | 0.1.0 | Privacy-first document processing FastMCP server with PII anonymization | 2025-08-13 17:45:52 |
xml-analysis-framework | 1.4.4 | XML document analysis and preprocessing framework designed for AI/ML data pipelines | 2025-08-12 04:21:41 |
raggy | 0.3.5 | scraping stuff | 2025-08-11 14:49:05 |